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Abstract 

We discuss an extension of the massively parallel cosmological simulation code GADGET-2, which now enables investigation 
of multiple and distinct gravitational force laws, provided they are dominated by a constant scaling of the Newtonian 
force. In addition to simplifying investigations of a universally modified force law, the ngravs extension provides a 
foundation for state-of-the-art collisionless cosmological simulations of exotic gravitational scenarios. We briefly review 
the algorithms used by GADGET-2, and present our extension to multiple gravities, highlighting additional features that 
facilitate consideration of exotic force laws. We discuss the accuracy and performance of the ngravs extension, both 
internally and with an unaltered GADGET-2, in the relevant operational modes. The ngravs extension is publicly released 
to the research community. 
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V-body cosmological codes are traditionally designed to inves¬ 
tigate a single gravitating species interacting via the Newto¬ 
nian force law. There exist viable extensions to General Rel¬ 
ativity [1], however, which predict weak-field, slow-motion lim¬ 
its featuring distinct gravitational force laws between distinct 
particle species. To enable investigation and constraint of these 
theories with available astrophysical data, a necessary first step 
is to extend an V-body simulator to handle distinct gravitating 
species. 

Solution method: 

The massively parallel Barnes-Hut tree. Fast Fourier Trans¬ 
form, and sorting routines of the versatile and well vetted N- 
body simulator[2] GADGET-2 were extended to support D dis¬ 
tinct gravitationally interacting species. The tree implementa¬ 
tion now vectorizes over each species’ monopole masses and po¬ 
sitions, the Fourier routines now handle active and passive grav¬ 
itational masses separately, and the sorting routines now group 
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all particle data by type. The appropriate TreePM adjusted 
forces are computed via FFT and tabulated before runtime. 

An additional file was introduced allowing the user to spec¬ 
ify all gravitational interactions: real space, Fourier space, 
and lattice summation corrections. To improve monopole ap¬ 
proximations in scenarios where the scale of the gravitational 
interaction depends on the mass itself, an optional tracking of 
the number of bodies contributing to any particular monopole 
approximation has been written. 

Restrictions: 

Mesh methods with non-periodic boundary conditions have 
been disabled. Force laws with mass dependent scale lengths 
are not amenable to the implemented Fourier methods (or even 
the traditional[3] Fourier approach). Nodes containing highly 
heterogeneous collections of particles with different mass de¬ 
pendent scale lengths may not be well-approximated, even with 
the additional tracking introduced. The collisional “gas” species 
can only interact via a single gravitational force law. 

Unusual features: 

The extension allows simultaneous consideration of at most six 
distinct central forces, where each is a sum of bounded, mono¬ 
tonic, gradients of radially modulated Newtonian potentials. 
This will serve as a common platform for model-dependent ad¬ 
justments to the cosmological background evolution. 

Additional comments: 

Data file format is identical to that of GADGET-2. Conhguration 
file format is unchanged, save for the addition of required bind¬ 
ings between particle species and gravitational type. To install 
ngravs, first install GADGET-2 (available at http://www.mpa- 
garching.mpg.de/gadget/ ), then replace the contents of the 
Gadget2 subdirectory with the files included in the provided 
tarball. Alternatively, one can clone the github repository 
https://github.com/kcroker/Gadget-2.0.7-ngravs and provide one’s 
own configuration and initial data files. 

Running time: 

Typical running times are < 2Dx those of GADGET-2, where 
D is an integer between 1 and 6. 
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1. Introduction 

At present, there is substantial evidence that most of 
the mass and energy within our universe is non-luminous. 
Big bang nucleosynthesis and baryon acoustic oscillations 
strongly constrain [T] the fraction of non-luminous matter 
with respect to the luminous component from the radia¬ 
tion dominated era onward. Yet, while they and other ev¬ 
idence, such as the Bullet Cluster [2], strongly suggest that 
the dark fraction can be well-approximated by pressure¬ 
less ideal fluid, the composition and precise distribution of 
this dark matter is far from clear. While numerous par¬ 
ticle dark matter candidates 0 are theoretically popular 
at present, conflicting exclusions from possible detections 
and increasingly stringent constraints from lack of direct 
detection [3] at cosmologically desirable mass scales con¬ 
tinue to motivate investigation in novel directions. 

With the growing wealth of high-precision astrophys- 
ical datajlHZ] and the absence of a “bottom-up” under¬ 
standing of dark matter, 7V-body methods[8j have become 
essential for comparison against highly non-linear theoret¬ 
ical predictions in structure formation over many decades 
of spatio-temporal scale. In the past twenty years, the so¬ 
phistication of such cosmological simulations both in phys¬ 
ical scope and technical implementation has undergone 
unprecedented growth[51-[TT]. While significant literature 
exists on the use of Wbody methods to explore standard 
structure formation scenarios, there exist proposed exotic 
scenarios that are also amenable to iV-body methods. In 
a particularly intriguing case, Hohmann & Wohlfarth [12] 
proposed to model Dark Energy as a repulsive gravita¬ 
tional interaction between matter species belonging to dis¬ 
tinct copies of the Standard Model. This model should 
make very strong predictions for weak-lensing observations 
and could be readily adapted to the dark matter prob¬ 
lem. This motivates the extension of existing numerical 
simulation tools to investigate and constrain more general 
cosmological models. 

One well-established and versatile simulation tool is 
gadget-2: a massively parallel code with extensive mem¬ 
ory and speed optimization employed both algorithmically 
and architecturally[T3|. GADGET-2 has been extended to 
consider three distinct gravitationally interacting species 


by Baldi et. al. El to enable investigation of coupled dark- 
energy cosmologies. Unfortunately, their extension is fo¬ 
cused on particular cosmological models and not publicly 
available. 

In the following, we discuss our independent and aug¬ 
mented implementation of D distinct gravitationally inter¬ 
acting species in GADGET-2. To facilitate investigation of 
models amenable to multi-species treatments by individual 
researchers, our publicly released ngravs extension permits 
convenient definition of distinct gravitational force laws 
and optionally provides additional data which can be used 
to improve force accuracy under certain scenarios. Our 
primary aim is to facilitate large scale structure investiga¬ 
tions of multi-species models, with considerable freedom 
in the precise form of the gravitational force laws. Our 
initial use of ngravs will be the constraint of such models 
through predicted galaxy power spectra. 

We assume the reader is familiar with the goals, con¬ 
struction, and operation of modern A^-body codes. Through¬ 
out this paper, D will always refer to the number of dis¬ 
tinct gravitationally interacting species, while N will refer 
to the number of bodies considered in any particular simu¬ 
lation. Units will be such that G = c = 1. Stock will refer 
to the unaltered GADGET-2.0.7 code, while ngravs will re¬ 
fer to our augmented version of this same code. 

2. Implementation 

The algorithms employed by GADGET-2 to compute 
collisionless forces are a Barnes-Hut tree walk and, option¬ 
ally, a particle mesh (PM) computation. To construct the 
tree, the simulation volume is recursively halved until each 
particle resides within its own leaf. Interparticle forces are 
then computed for any specific particle by recursive traver¬ 
sal of the tree, halted when a monopole approximation of 
the force from all deeper branches satisfies a user-specified 
force accuracy “opening criterion.” PM forces are com¬ 
puted by interpolating particle positions to a mass density 
defined on a regular grid, performing a Fourier transfor¬ 
mation, convolving with the fc-space Greens’ function, and 
inverting the transform. The resulting potential is then 
interpolated back to forces at all particle positions. The 
tree algorithm may be used exclusively, or can be combined 
with the PM algorithm in a hybrid arrangement (TreePM) 
where the tree is used to compute short range forces, and 
the PM algorithm used to rapidly compute distant con¬ 
tributions. In TreePM mode, the A:-space Greens’ func¬ 
tion is Gaussian filtered, and the corresponding truncated 
short range force is summed with a spatially restricted 
Tree walk. The GADGET-2 code is engineered so that tree 
and PM computations are nearly decoupled from the more 
intricate collisional force computations, time integration, 
parallelization decomposition, and 10 routines. This per¬ 
mits a targeted and straight-forward extension to multiple 
gravitational interactions. 

gadget- 2 implements 6 distinct particle types, includ¬ 
ing a “gas” type which features additional force compu- 
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Table 1: Time averaged D = 1 and D = 3 ngravs tree computa¬ 
tion performance with Newtonian interactions, normalized to stock 
runtimes. N represents particles processed per second, adjusted for 
interprocess communication delays. Subscript labels indicate stock 
(stc) or ngravs {D = 1,3). Note that the purely collisional gas sphere 
cannot be tested for D > 1 (see § |2.5| |. 


Test case 

(Nd^i) / {Nstc) 

{Nd=3) / {Nstc) 

Galaxy collision 

0.706 

0.334 

ACDM gas 

0.779 

0.405 

Gas sphere 

0.759 

- 


tations from collisional dynamics. Let 1 < Z? < 6 be 
the number of distinct gravitational species. For parti¬ 
cle types the following map is established between 
0<i^j<K = D—1 generalizing Newton’s law of gravi¬ 
tation 


—» m ' ■ 

F{m„Mj,r,Nj_) = - f^Jir,NJ_)r (1) 

Here each is dominated by a constant scaling of the 
Newtonian force, with specific forms detailed in §2.1| They 
depend on the separation of masses r, the active mass Mj, 
the passive mass rrii, and the number of source particles 
N± contributing to the monopole approximation employed 
by the Barnes-Hut tree algorithm. In the direct force case, 
Nj_ = 1. 

As the standard Newtonian force diverges as r —>■ 0, 
in order to maintain numerical accuracy under reasonable 
timesteps, GADGET-2 artificially smooths to zero the grav¬ 
itational interaction below some (type dependent) length 
scale. In general, distinct “softenings” analogous to 
0 must also be specified. We have implemented the 
above mappings through function pointers, enabling much 
model-dependent code to be conveniently populated within 
a single location. Thus, at the user’s option, tables for the 
gravitational potential, fc-space Greens’ functions, and lat¬ 
tice sum corrections may also be specified as desired. 

Pure tree computations have been extended to both 
periodic and non-periodic modes, while TreePM computa¬ 
tions have been extended only to periodic mode at present, 
removing the possibility of secondary PM “zoom” simula¬ 
tions from the ngravs extension. For justification of this 
design choice, we direct the reader to § |2.5[ 

Performance comparisons with respect to stock can be 
found in Table [l] where we find an ~40% increased run¬ 
time due to additional overhead within the tree algorithm. 
It should be noted that the simulation specific performance 
of ngravs, apart from this constant scaling, is unchanged 
from that of stock. Thus, in D = 1 cosmological scenar¬ 
ios, the ngravs extension enables convenient investigation 
of a single, globally modified force law; if one can afford 
modestly longer runtimes. More importantly, the struc¬ 
tures containing data on all N simulation particles are 


unchanged and so the favorable memory storage require¬ 
ments of gadget- 2 are maintained. 


2.1. Specific forms of fij 

The Barnes-Hut algorithm makes assumptions about 
the nature of the force law which must be honored to 
maintain accuracy of the algorithm. This is not a seri¬ 
ous impediment, as arbitrary forces are not relevant for 
cosmological investigations. We now develop properties of 
the fij which will permit investigation of a very wide range 
of possible scenarios, while simultaneously maintaining the 
established force accuracies of GADGET-2. 

Investigations of the gravitational force between bary- 
onic matter strongly constrain this interaction to an inverse- 
square law (ISL)[T5]. Analogous constraint for dark mat¬ 
ter, however, must presently come from large scale astro- 
physical data. This leaves margin for speculation and mo¬ 
tivates modification to the gravitational interaction of the 
dark component. Two popular deviations from the bary- 
onic ISL, in the notation of |15j . are the Yukawa-like 


F{r) = GMto-Z- 
dr r 


— T 

C + a exp ( — 


and power-law 


F{r) = GMto-Z- 
dr r 




M-l 


r. 


( 2 ) 

( 3 ) 


We introduce the parameter G G {0,1} to permit consid¬ 
eration of a “pure Yukawa” law as could present in massive 
gravity theories |lfij or a harder power law, which would fol¬ 
low from dimensional considerations if point-like dark mat¬ 
ter interactions were well-approximated by a Poisson law 
in greater than 3 non-compact spatial dimensions. Note 
that Equations ([^ with G = 1 and ([^ strengthen the 
force law at small scales. 

One may also consider the usual Newtonian force, but 
construct “effective” point-like force laws for extended mass 
distributions which remain rigid on dynamical timescales. 
An example could be a dark matter “particle” sourced by a 
non-pointlike mass density p{r), stable on timescales rel¬ 
evant to simulation length. Such objects could be used 
to reduce particle count in a simulation, or to investigate 
novel approaches to the missing satellites [T7j and “core 
cusp” [IHl problems. These objects would interact with the 
usual baryonic matter via the following force 


F{r) 


GMm-Z 

dr 



pi' 


'')r'^dr' 


( 4 ) 


where the integral must remain finite as r —)■ oo. Note 
that, in contrast to the above force laws with G = 1, Equa¬ 
tion Q diminishes the force law at all scales. 

Equations 0, 0 , and 0 all take the form of the 
gradient of a modulated Newtonian potential 

- d Msy(r) „ d 

Fijfr) = m- - - -r = -m-—Vij(r)r (5) 

dr r dr 
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where M is the active gravitational mass, Sij (r) is bounded, 
positive, and limr_>.oo (r) monotonically approaches a 
constant value. In the following discussion, we restrict our 
consideration to this class of force laws. We emphasize 
that M and must appear as a multiplier in order to main¬ 
tain the superposition required by both the Tree and PM 
methods. It is assumed that the Equivalence Principle 
holds[TS], and so m must also appear as a multiplier. We 
will now drop this passive/inertial mass and consider the 
accelerations a^- when convenient. 

For cosmological simulations of metric theories of grav¬ 
ity, the evolution of the scale factor is determined by the 
full field equations. Contributions to the weak field equa¬ 
tions atop this Robertson-Walker background are then de¬ 
termined by first order perturbation theory. While one can 
now investigate significantly more general force laws below 
the horizon scale, if the background expansion is signifi¬ 
cantly altered from the Friedmann equations, additional 
adjustment to the timestep routines may be required to ob¬ 
tain meaningful numerical results on cosmological scales. 
Such modifications are beyond the scope of the present 
work. 


2.1.1. Timesteps 

The determination of a suitable timestep for the nu¬ 
merical integration is a subtle problem!^. On the one 
hand, it is desirable to use the largest possible timestep 
to speed the simulation, but one must do so while main¬ 
taining force accuracy. In general, the larger the force, the 
smaller the required timestep. Though there are many 
approaches to determining a timestep |2Ij. most require 
knowledge of higher derivatives and would require signif¬ 
icant departure from the existing GADGET-2 code. We 
thus implement a minimal departure from the algorithm 
of gadget-2. This guarantees the established force accu¬ 
racy of gadget-2 and facilitates comparision with studies 
performed with stock. 

In the simplest scenario, GADGET-2 maintains sym- 
pletic time evolution by synchronously stepping each par¬ 
ticle by 


At 


grav 


= min 


At 


max 5 



( 6 ) 


with rj a user-specified dimensionless accuracy, a the parti¬ 
cle’s acceleration, and e a user-specified gravitational soft¬ 
ening length. The maximum timestep Atmax is either spec¬ 
ified by the user, or determined in cosmological simulations 
by enforcing that rms displacement should be well below 
mean particle separation. Though the nonphysical e enters 
the timestep computation, this timestep has been shown 
to be robust for Newtonian interactions through numerical 
investigations [52] . 

To guarantee the already established force accuracy of 
gadget-2 without alteration of the time integration algo¬ 
rithms, we note that the permissible fij exhibit an 
such that the accelerations Uij are bounded by 
provided one remains outside of an appropriately defined 


Figure 1: Average particles processed per second N given D species, 
normalized to D = 1. Curves are separate one-parameter fits cx~^ 
to the data. Note the decrease proportional to D~^ as expected 
in both pure tree and TreePM modes. Rather large error bars for 
the TreePM scenario have been omitted for clarity as the relevant 
quantity is average performance. 
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softening scale. Thus, we may modify Equation iby a 
constant scaling 


Atgrav = min 


AA 


2r]e 


1/2 


(7) 


This works because any acceleration smaller than that due 
to this scaled Newtonian force will produce smaller adjust¬ 
ments to any trajectory per unit time, and thus will be 
tracked to greater precision by the existing timestepping 
algorithms. 


2.2. Tree forces 

In keeping with our approach, the existing tree routines 
were extended simply by vectorizing over the monopole 
mass centers and velocities. This additional data increases 
tree memory consumption by ^ 0.3(D — 1) from stock. In 
order to accommodate more exotic force laws for which 
the interaction scale is related to the active gravitational 
mass in more complicated ways, the tree structure and 
construction were augmented to optionally track N± for 
all contributing types. This quantity can then be used to 
suitably correct computation of the monopole moment. 

During force computation, tree walks may be signif¬ 
icantly optimized by suitable choice of opening criteria. 
For collisionless force computations in stock, the tree is tra¬ 
versed at most twice, once for the usual particle-particle in¬ 
teraction, and once again for any periodic correction. The 
latter walk can proceed more efficiently, as the opening cri¬ 
teria is significantly different: corrections to near particles 
are very small. We considered performing D separate tree 
walks, but stock employs a very effective relative opening 
criteria during the tree walk, where the acceleration pre¬ 
viously computed is compared to a Newtonian estimate of 
the new acceleration to determine whether to traverse the 
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branch. These previous accelerations, however, are stored 
on a per-particle basis and the amount of memory required 
to store D — 1 additional accelerations produced an unac¬ 
ceptable increase in memory consumption. It was decided 
that, since many alternative force-laws cannot deviate too 
strongly from the Newtonian force, any gains in speed due 
to decreased depth within the tree would not be offset by 
the additional memory requirement. Instead, we vector¬ 
ized within the single tree walk that GADGET-2 already 
performs, and continue to employ the Newtonian relative 
opening criteria. This opening criteria is conservative, pro¬ 
vided that the alternative force-laws are dominated by the 
usual Newtonian interaction. This results in slightly im¬ 
proved force errors that can be easily understood: in a 
limit where the mass distribution is characterized by N 
distinct monopoles, one simply regenerates the exact force. 
Overall, the tree walk runtime increases by a factor of D. 
Indeed such behavior is found in Figure where the av¬ 
erage number of particles processed per second decreases 
as \/D. 

2.3. Lattice corrections 

Stock periodic computations may optionally proceed in 
pure tree mode with the method of Ewald summation|23j. 
The Ewald technique is a specific application, to the ISL, 
of methods designed to transform poorly convergent sums 
of periodic images in r-space into rapidly convergent sums 
in Ic-space. These methods belong to the broader topic 
of lattice sums [23] which are, in general, challenging to 
compute. In terms of error functions, computation can 
be reduced to auadrature|25j. with the usual caveats that 
apply to numerical integration, ngravs can tabulate given 
corrections from an infinite image lattice for each of di¬ 
rect forces, and interpolate to actual particle positions in 
the same manner as stock. In practice, the lattice correc¬ 
tion is usually specified as a consistency check against the 
Fourier computations, and not used in actual simulations. 

2 . 4 . PM forces 

In order to minimize surface to volume ratio, stock 
does not attempt to overlap local PM computation with 
local particle distribution. Instead, density data is ex¬ 
changed between all parallel processes according to an op¬ 
timal slab decomposition determined by the Fast Fourier 
Transform (FFT) routines [55], and the resulting potential 
is exchanged back. Since the identities of all the particles 
which contribute to any given slab are unknown, the PM 
routine must iterate times (instead of D{D + l)/2 and 
exploiting symmetry), so that each gravitational type can 
be both the passive and active gravitational mass. Though 
more sparse, the exchanged data is of the same dimension 
and the runtime increases by a factor of D^. This per¬ 
formance degradation is of little concern, however, as the 
FFT runtime continues to be heavily subdominant to that 
of the Tree algorithm, as is clear in Figure [^ 

In gadget-2, one may optionally enable Peano-Hilbert 
sorting of particle data on each local processor. This was 


found by Springel[T3] to often give substantial (but ar¬ 
chitecture dependent) improvements in runtime as spatial 
proximity translates to memory proximity. To enable pro¬ 
cessing of the entire local particle content with only a sin¬ 
gle traversal of the data, if I? > 1 and TreePM mode is 
enabled, we have implemented an additional sort by gravi¬ 
tational type before the Peano-Hilbert sort. Subsequently, 
each gravitational type is then Peano-Hilbert sub-sorted. 
As is done in stock, both sorts proceed so that only one 
reordering of the particle data memory is required. For 
compatibility with the collisional code of GADGET-2, colli- 
sional particles must be mapped to gravitational type zero. 

2 . 4 . 1 . TreePM short-range forces 

In the hybrid TreePM mode, stock applies a Gaussian 
low-pass filter to the fc-space Newtonian potential. This 
permits highly accurate and rapid computation of a long- 
range Coloumb force with the PM algorithm. On spatial 
scales near a user-specified Osmth number of mesh cells, the 
short-range force is calculated by smoothly transitioning 
to a partial tree walk using a suitably adjusted potential 

[1 _ exp {-erl)] } (8) 

and taking a radial derivative, where denotes the 

3D inverse Fourier transform, and rg = 2'Kaamth/L. It 
is a convenient coincidence that, in the Newtonian case, 
this Fourier transform yields (^®^°‘'*(r) = (j){r)eric{r/2rg). 
Stock samples a user-specified number A^tab of this factor 
and its derivative, and then multiplies the computed tree 
potential and force by these respective modulations. For 
the case of a general force-law, we consider instead 

ott r 

= / ^;;M4(fc)exp(-fcV^)} dr' 

(9) 

where we have performed the angular integrations of the 
transform. We have also well-conditioned the computation 
by taking a derivative with respect to r, and writing 

( 10 ) 

which is just f'ijik) normalized by the Newtonian Greens’ 
function. This lessens the severity of singularities in 4>ij{k) 
before computation due to our prior constraint of the . 

Subtractions, such as in Equation]^ which involve two 
values very near unity can suffer from loss of precision. We 
note, however, that the Fourier integrand of Equation [^ is 
effectively band-limited, so its values may be rapidly and 
precisely (to near machine precision) computed during ini¬ 
tialization of ngravs by an inverse FFT. The acceleration 
is found from differentiation of Equation]^ with respect to 
r 

b-TT r 

=aij{r) - — / {(j)ij{k) exp{-k^r^^)} dr' 

9'7r 

+ exp{-k‘^rl)} . ( 11 ) 
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To maintain requisite precision during the computation, 
the integration in r-space is performed using a 4-point 
Newton-Cotes formula, giving an error for the integration 
bounded by 

< - - - 5 ( 12 ) 

(IOA^tab^) 

where ^ G N is a user-specified parameter which controls 
the resolution of the FFT. The error bound in Equation [T^ 
follows from application of standard inequalities and our 
constraint of the a^. 


2.5. Limitations 

Stock computes the non-periodic fc-space Greens’ func¬ 
tion through explicit FFT of the sampled r-space poten¬ 
tial on a mesh of twice the dimension desired for use in 
simulation. Unfortunately, generalization of this proce¬ 
dure from the specification of the periodic fc-space Greens’ 
function as is done in the accurate computation of the 
truncated short-range force is not practical due to mem¬ 
ory constraint. Similarly, sampling the transformed radial 
function enough to guarantee the requisite accuracy on all 
points of the lattice through cubic interpolation encounters 
similar memory constraint. Since non-periodic computa¬ 
tions may be performed to unlimited dynamic range with 
the extended Tree algorithm, and since large-scale inves¬ 
tigations of modified force laws would naturally proceed 
investigations on smaller scales, we do not believe this to 
be a serious omission. 

In gadget-2, collisional forces are only computed for 
one type of “gas” particle. As highlighted by Marri & 
White P7]. distinct gas species can allow the effective cap¬ 
ture of a broad range of physical phenomena involved in 
galaxy formation and evolution. While it is possible to 
alter the gravitational force laws involving the gas, ngravs 
does not implement multiple gas species. This choice was 
one of simplicity and further extension to multiple gas 
species, i.e. following Scannapieco et. aZ.[55j, is straight¬ 
forward and could form the basis of future work. 

At present, particle interactions under force laws with 
mass dependent scale are only accurately computed for 
uniformly massed gravitational species. This applies to 
both active and passive mass dependence. Without uni¬ 
form masses, accurate computation is only possible only in 
pure tree, non-periodic mode. This is due to the Fourier 
computation’s use of a single momentum-space Greens’ 
function for all contributing densities, and to the neces¬ 
sary precomputation of force and potential correction ta¬ 
bles for direct infinite lattice contributions. Even in non¬ 
periodic pure tree mode, due to the monopole averaging 
procedure, considerable force errors could result given situ¬ 
ations where a node contains comparable numbers of same- 
species particles with different masses. The efficient and 
accurate computation of such interactions is an open ques¬ 
tion. 


Figure 2: Fraction of forces computed during entire simulation 

with force error in excess of A///, for Pure Tree mode and non¬ 
periodic boundary conditions, with D = 3 ngravs (blue, dashed). All 
interactions are Newtonian to permit direct comparison with stock 
(magenta, solid). Residuals |A| from stock shown at top. Note 
slightly improved force accuracy for ngravs due to more detailed 
characterization of the mass distribution. 
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3. Test problems 

Since GADGET-2 has been well vetted over the past 
decade, to verify the correctness of the ngravs extension, 
it suffices to demonstrate that ngravs with Newtonian in¬ 
teractions and stock agree in their common operational 
modes. In addition to these consistency checks, we also 
investigate the TreePM and Pure Tree algorithms under 
more general force laws. 

Profiling results presented in Table[2and Figure[2were 
carried out to double precision on a dedicated machine us¬ 
ing 32 of 48 available cores for computation with 94 giga¬ 
bytes of RAM. Periodic profiling and force accuracy runs 
were performed at double precision with a 128 point mesh. 
TreePM transition studies were performed on a 256 point 
mesh. Simulation initial conditions are those included with 
stock, with particle type reassigned as necessary to test 
ngravs. Our parameters can be found in Appendix |Ap-| 


3.1. Comparison with stock 

We present force accuracy comparisons between stock 
and D = 3 ngravs for two of the four stock included 
test initial conditions. In Figure we demonstrate con¬ 
sistency with stock for pure tree operation under non¬ 
periodic boundary conditions. In Figure we demon¬ 
strate consistency with stock for TreePM operation un¬ 
der periodic boundary conditions. This test verifies the 
correctness of the lattice summation, as the direct force 
computation requires the lattice correction under periodic 
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Figure 3: Fraction of forces computed during entire simulation with 
force error in excess of A///, for TreePM mode and periodic bound¬ 
ary conditions, with D = 3 ngravs (blue, dashed). All interactions 
are Newtonian to permit direct comparison with stock (magenta, 
solid). Residuals |A| from stock shown at top. Note that an erro¬ 
neous softening scale of 600kpc in the default stock ACDM initial 
condition was reduced to 50kpc to keep the softening scale below the 
cutoff for use of the PM computation. 
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Figure 4: Force error as a function of separation for the Coloumb 
interaction with periodic boundaries. Stock is shown in red (grey), 
and ngravs D = 1 m blue (black). Errors stacked from ten distinct 
simulations of a randomly placed massive source interacting with ran¬ 
domly placed test particles. Vertical lines represent the mesh scale 
(dotted) and transition scale (dashed). Binned rms shown at top, 
with a = 0.005 the specified error tolerance for the tree algorithm. 
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boundary conditions. This periodic test also includs a col- 
lisional “gas” species, which further verifies the integrity 
of the unmodified collisional code. All figures display the 
fraction of force computations with force error 


A/// = 


Fa.lg — Fj ^2 
Fn2 


(13) 


in excess of that fraction. Here, subscripts “alg” and 
N'^ represent computation by tree/mesh routines and di¬ 
rect Newtonian summation, respectively. The collapsing 
gas sphere was excluded as the present implementation of 
ngravs requires that all collisional particles be of the same 
gravitational species. Note that the force accuracies are 
virtually identical; favorable residuals indicate relatively 
minor improvements due to three multipole moments per 
node. 

In addition to characterizing errors across entire sim¬ 
ulations, we have also investigated the detailed error be¬ 
havior of the TreePM algorithm in the transition region. 
We perform this test by creating a sequence of initial con¬ 
ditions each with a randomly placed massive source and 
shells of ^ 5000 randomly placed test particles. The test 
particle density drops as 1/r^ so that the number of in¬ 
teractions per shell is roughly constant. All particles have 
zero initial velocities and we consider only the first force 
computation, recording force errors for all interactions. 
The results of 10 of these runs are then stacked. Per¬ 


formance with the Newtonian interaction can be seen in 
Figure where the force error behavior is indistinguish¬ 
able from that of stock. 

3.2. The pure Yukawa interaction 

To verify correct and accurate behavior of the TreePM 
transition algorithm with more general forces, we have im¬ 
plemented the pure Yukawa (Yukawa) interaction. The 
Yukawa interaction represents a pathological “edge case” 
for the fc-space Gaussian low-pass filter approach, because 
its r-space behavior is already exponentially suppressed 
before filtering. Conveniently, there exists significant lit¬ 
erature on lattice sums involving the Yukawa potential, 
and our reference lattice implementation follows that of 
Salin and Caillol[29]. By comparison with a box filter, it 
was determined that the tree contribution remained non- 
negligible relative to the PM contribution under the Gaus¬ 
sian filter. Fortunately, we found that this pathology, spe¬ 
cific to the Gaussian filter, can be corrected by simply scal¬ 
ing the Yukawa A:-space Greens’ function by exp(—y^r^) 
where ym is the Yukawa field mass and is the filter tran¬ 
sition scale. 

In Figure we demonstrate the error performance for 
dimensionless ym G {10,50}, with 50 sufficient to give 
eventual 10 x suppression of the Coloumb potential over 
the transition region. Note that the rms error is essen¬ 
tially unchanged throughout the transition region. The 
rms error at large r increases with increasing ym due to 
the presence of additive exponential terms in the reference 


7 

































Figure 5: Force error as a function of separation for the Yukawa 
interaction with periodic boundaries. Yukawa field masses ym = 10 
(blue, black) and ym = 50 (pink, grey) shown. Errors for each 
field mass stacked from ten distinct simulations between a randomly 
placed massive source, with randomly placed test particles. Vertical 
lines represent the mesh scale (dotted) and transition scale (dashed). 
Binned rms shown at top, with a = 0.005 the specified error tolerance 
for the tree algorithm. 
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lattice sum and subsequent loss of precision. We note that 
this force test can be performed accurately in ngravs be¬ 
cause the source and test masses can be assigned distinct 
gravitational types. Intra-type interactions can then be 
turned off completely. This is relevant for proper inves¬ 
tigation of the pure Yukawa interaction, as exponential 
suppression makes nearer neighbors relevant, even with a 
very massive source. 

In addition to pure Yukawa, we also have explored an 
evenly weighted sum of Yukawa and Coloumb to verify 
that our correction factor to Yukawa is robust. We find 
that the error performance is essentially unchanged from 
that of Coloumb, which verifies appropriate behavior in 
the transition region of the summed force. 


3.3. The accumulator 

To verify the newly introduced optional tracking of 
contributing particle counts iV_L in tree computations, we 
introduce a gravitational species where N±^ can be used to 
give an exact correction to the monopole approximation of 
the force laws. This test involves two species: species zero 
interacts via the usual Newtonian interaction 


‘^oo{r) 


Mpmo 

r 


(14) 


where we use uppercase to denote the active gravitational 
mass and lowercase to denote the passive mass. The sec¬ 
ond species, denoted one, is characterized by a dimension¬ 


less scale j3 and interacts as 




2MiTOi 

7rr 


tan ^ 


/ 47r/?r \ 

\Mi/iV_L +'mi) ' 


(15) 


Note that the interaction approaches Newton’s at large r, 
but softens to a constant as r —>• 0. This is the exact po¬ 
tential between two hypothetical, spherically symmetric, 
cored densities of the following form 


p(r, M) 


1 

([M/2/37r]^ + r2)' 


(16) 


where M is the total mass enclosed over all space by an 
object with density given by Eqn. 16 Interactions across 
species are of the same functional form as Eqn. |15| apart 
from the softening scale, which continues to be set by the 
cored object 


^io{r) 

$oi(?’) 


2Mpmi 

TTr 

2Mimo 

TTr 


tan ^ 
tan”^ 


/ 47r/3fV_Lr 

V mi 


/ 47r/3fVj_r\ 

■ 


(17) 

(18) 


Note that Newton’s third law is not violated, the distinct 
Eqns. [T7| and [T8| distinguish between the passive and ac¬ 
tive mass for correct computation. One may think of 
this interaction as a preference for phenomenological cored 
halos [HllSO] motivating a new class of hypothetical object, 
the Massive Astronomical Halo (Mahalo). The results in 
Eigure exhibit the force accuracy achieved through this 
novel N± feature. 


4. Conclusion 

We have detailed the ngravs extension to the massively 
parallel hybrid Tree and mesh A^-body code GADGET-2, 
which now permits consideration of gravitational in¬ 
teractions between D particle species. Periodic simula¬ 
tions can proceed in either Tree or hybrid TreePM mode, 
while non-periodic simulations may be run in Tree mode. 
Our implementation vectorizes over the existing monopole 
moments within the Barnes-Hut tree, and distinguishes 
between active and passive gravitational mass during the 
mesh computations. Memory consumption remains favor¬ 
able: particle data storage is unchanged from GADGET-2, 
tree storage is increased by ^ 0.3(11—1), and Pourier stor¬ 
age requirements are unchanged. We subject the code to 
numerous tests to gauge performance both in runtime and 
in force accuracy. We verify that runtime is dominated by 
tree performance and scales as kD for k G (1,1.43) rela¬ 
tive to that of gadget-2. We find qualitatively identical, 
slightly improved, force accuracies compared to GADGET- 
2: behavior expected from our particular implementation. 
We also have introduced and verified a novel feature, which 
tracks the number of contributing particles of all species 
to any given monopole approximation, which can then be 
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Figure 6: Internal check of ngravs Nj_ feature. Force accuracy 

comparison in pure tree mode between two distinct gravitationally 
interacting species: one Newtonian, the other as described with /3 = 
1.31 X 10“^. Note that force accuracy comparable to GADGET-2 is 
maintained when the number of particles contributing to monopole 
approximations, A^_l, is tracked (yellow, solid) and used to adjust the 
force. Without such tracking, the force errors increase by an order 
of magnitude (orange, dashed). 
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used to correct exotic force laws with dynamic softening 
lengths. We believe that the ngravs extension will facil¬ 
itate investigation and constraint of exotic gravitational 
scenarios and have released ngravs publicljQ to the re¬ 
search community. 
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Appendix A. Initial conditions and simulation pa¬ 
rameters 

For completeness, we characterize the initial conditions 
and paired simulation parameters packaged with GADGET- 


^The website for downloading ngravs is 
https://github.eom/kcroker/Gadget-2.0.7-ngravs 


Table A.2: Initial conditions and parameters for the Pure Tree test 
case, e is the softening length. 


Collision of 2 spiral galaxies 


\x^\ 

< 200 kpc 

\v^\ 

< 350 km/s 

Type #I 

Collisionless 


A = 4 X 10^ 


m = 1.05 X 10-3 Mq 


e = 1 

Type #2 

Collisionless 


A = 2 X 10^ 


m = 2.33 X 10-4 Mq 


e = 0.4 


Table A.3: Initial conditions and parameters for the TreePM test 
case. Cosmological fractions and Hubble parameter are the stan¬ 
dard values, e is both the comoving and physical softening length, 
which was changed from an erroneous default value that caused the 
softening scale to overlap the PM transition region. 


ACDM Universe from ^ = 10 


\Xi\ 

< 5 X 104 kpc 

kil 

< 103 km/s 

Type #1 

Collisional 

A = 243 
m = 4.24 Mq 
e = 50 

Type #2 

Collisionless 

A = 243 
m = 27.5 Mq 
e = 50 


2, which we have used to test and benchmark ngravs. 
These initial conditions were chosen for convenience and 
consistency, and they exercise the full range of modified 
functionality within ngravs. For simulations involving D > 
1, collisionless particles were distributed evenly into the re¬ 
maining types and assigned identical parameters. 
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